AITopics

Country: Asia (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Feb-13-2026, 19:52:06 GMT

Spatial-Aware Feature Aggregation for Image based Cross-View Geo-Localization

Yujiao Shi, Liu Liu, Xin Yu, Hongdong Li

Existing deep methods overlook those appearance and geometric differences, and instead use a brute force training procedure, leading to inferior performance.

aerial image, artificial intelligence, machine learning, (19 more...)

Country:

North America > United States (0.06)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Neural Information Processing Systems, Aug-20-2025, 00:22:19 GMT

Spatial-Aware Feature Aggregation for Image based Cross-View Geo-Localization

Yujiao Shi, Liu Liu, Xin Yu, Hongdong Li

One of the key reasons is the vast differences between the two view modalities, i.e.,

aerial image, correspondence, polar transform, (14 more...)

Country:

North America > United States (0.07)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
North America > Canada (0.04)
Asia > China (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Li, Yaxuan, Huang, Yewei, Gaudel, Bijay, Jafarnejadsani, Hamidreza, Englot, Brendan

CVD-SfM: A Cross-View Deep Front-end Structure-from-Motion System for Sparse Localization in Multi-Altitude Scenes

arXiv.org Artificial Intelligence, Aug-14-2025

We present a novel multi-altitude camera pose estimation system, addressing the challenges of robust and accurate localization across varied altitudes when only considering sparse image input. The system effectively handles diverse environmental conditions and viewpoint variations by integrating the cross-view transformer, deep features, and structure-from-motion into a unified framework. To benchmark our method and foster further research, we introduce two newly collected datasets specifically tailored for multi-altitude camera pose estimation; datasets of this nature remain rare in the current literature. The proposed framework has been validated through extensive comparative analyses on these datasets, demonstrating that our system achieves superior performance in both accuracy and robustness for multi-altitude sparse pose estimation tasks compared to existing solutions, making it well suited for real-world robotic applications such as aerial navigation, search and rescue, and automated inspection.

I. INTRODUCTION

Structure-from-motion (SfM) [1], [2], [3] has been receiving extensive attention in the field of computer vision and robotics; it is pivotal in various real-world applications, including autonomous navigation [4], rapid mapping after natural disasters for situational awareness, detailed preservation of historical landmarks [5], and immersive virtual reality (VR) experiences. As research progresses, it provides a cornerstone in achieving pose estimation and reconstruction, standing out as a particularly effective technique, specifically when input is limited to sparse images. While conventional SfM approaches deliver impressive results under conditions of abundant image overlap, they often struggle with sparse input captured at vastly different altitudes. In these scenarios, the drastic viewpoint differences limit shared visual features, making it difficult to establish reliable correspondences.

artificial intelligence, human computer interaction, machine learning, (17 more...)

2508.01936

Genre: Research Report (0.82)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Norman, Kalin, Mangelson, Joshua G.

Feature Geometry for Stereo Sidescan and Forward-looking Sonar

arXiv.org Artificial Intelligence, Jul-9-2025

In this paper, we address stereo acoustic data fusion for marine robotics and propose a geometry-based method for projecting observed features from one sonar to another for a cross-modal stereo sonar setup that consists of both a forward-looking and a sidescan sonar. Our acoustic geometry for sidescan and forward-looking sonar is inspired by the epipolar geometry for stereo cameras, and we leverage relative pose information to project where an observed feature in one sonar image will be found in the image of another sonar. Additionally, we analyze how both the feature location relative to the sonar and the relative pose between the two sonars impact the projection. From simulated results, we identify desirable stereo configurations for applications in field robotics like feature correspondence and recovery of the 3D information of the feature. Field robotic applications, such as localization and mapping, in underwater environments face significant challenges due to the complex and dynamic nature of the marine domain.

artificial intelligence, information fusion, sonar, (17 more...)

2507.0541

Country: North America > United States (0.28)

Genre:

Overview (0.93)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

arXiv.org Artificial Intelligence, Jul-1-2025

Event-based Stereo Visual-Inertial Odometry with Voxel Map

Zhang, Zhaoxing, Wang, Xiaoxiang, Zhang, Chengliang, Guo, Yangyang, Yuan, Zikang, Yang, Xin

The event camera, renowned for its high dynamic range and exceptional temporal resolution, is recognized as an important sensor for visual odometry. However, the inherent noise in event streams complicates the selection of high-quality map points, which critically determine the precision of state estimation. To address this challenge, we propose Voxel-ESVIO, an event-based stereo visual-inertial odometry system that utilizes voxel map management, which efficiently filters out high-quality 3D points. Specifically, our methodology utilizes voxel-based point selection and voxel-aware point management to collectively optimize the selection and updating of map points on a per-voxel basis. These synergistic strategies enable the efficient retrieval of noise-resilient map points with the highest observation likelihood in current frames, thereby ensuring the state estimation accuracy. Extensive evaluations on three public benchmarks demonstrate that our Voxel-ESVIO outperforms state-of-the-art methods in both accuracy and computational efficiency.
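The per-voxel selection idea in this abstract can be illustrated with a small sketch. This is not the Voxel-ESVIO implementation: the function name `select_per_voxel`, the `scores` array (a stand-in for whatever "observation likelihood" the authors compute), and the parameters `voxel_size` and `k` are all assumptions made for illustration.

```python
from collections import defaultdict

import numpy as np


def select_per_voxel(points, scores, voxel_size, k=1):
    """Return sorted indices of the k highest-scoring points in each occupied voxel.

    `scores` is a hypothetical per-point quality value; higher is better.
    """
    points = np.asarray(points, dtype=float)
    scores = np.asarray(scores, dtype=float)

    # Integer voxel coordinates: floor(position / voxel edge length).
    keys = np.floor(points / voxel_size).astype(int)

    # Group point indices by the voxel they fall into.
    buckets = defaultdict(list)
    for idx, key in enumerate(map(tuple, keys)):
        buckets[key].append(idx)

    # Within each voxel keep only the k best-scoring points.
    kept = []
    for members in buckets.values():
        members.sort(key=lambda i: scores[i], reverse=True)
        kept.extend(members[:k])
    return sorted(kept)
```

Hashing points into voxels keeps the selection local: a noisy cluster can contribute at most `k` points, so the retained map points stay spread across the scene.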

artificial intelligence, map point, odometry, (11 more...)

2506.23078

Country: North America > United States > Minnesota (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Robots (0.50)
Information Technology > Artificial Intelligence > Vision (0.47)
Information Technology > Sensing and Signal Processing > Image Processing (0.46)

arXiv.org Artificial Intelligence, Jun-4-2025

Pro3D-Editor: A Progressive-Views Perspective for Consistent and Precise 3D Editing

Zheng, Yang, Huang, Mengqi, Chen, Nan, Mao, Zhendong

Text-guided 3D editing aims to precisely edit semantically relevant local 3D regions, which has significant potential for various practical applications ranging from 3D games to film production. Existing methods typically follow a view-indiscriminate paradigm: editing 2D views indiscriminately and projecting them back into 3D space. However, they overlook the different cross-view interdependencies, resulting in inconsistent multi-view editing. In this study, we argue that ideal consistent 3D editing can be achieved through a progressive-views paradigm, which propagates editing semantics from the editing-salient view to other editing-sparse views. Specifically, we propose Pro3D-Editor, a novel framework, which mainly includes Primary-view Sampler, Key-view Render, and Full-view Refiner. Primary-view Sampler dynamically samples and edits the most editing-salient view as the primary view. Key-view Render accurately propagates editing semantics from the primary view to other key views through its Mixture-of-View-Experts Low-Rank Adaption (MoVE-LoRA). Full-view Refiner edits and refines the 3D object based on the edited multi-views. Extensive experiments demonstrate that our method outperforms existing methods in editing accuracy and spatial consistency.

artificial intelligence, machine learning, natural language, (15 more...)

2506.00512

Genre: Research Report > New Finding (0.34)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

Lee, Jongwon, Bretl, Timothy

GSFeatLoc: Visual Localization Using Feature Correspondence on 3D Gaussian Splatting

arXiv.org Artificial Intelligence, May-2-2025

In this paper, we present a method for localizing a query image with respect to a precomputed 3D Gaussian Splatting (3DGS) scene representation. First, the method uses 3DGS to render a synthetic RGBD image at some initial pose estimate. Second, it establishes 2D-2D correspondences between the query image and this synthetic image. Third, it uses the depth map to lift the 2D-2D correspondences to 2D-3D correspondences and solves a perspective-n-point (PnP) problem to produce a final pose estimate. Results from evaluation across three existing datasets with 38 scenes and over 2,700 test images show that our method significantly reduces both inference time (by over two orders of magnitude, from more than 10 seconds to as fast as 0.1 seconds) and estimation error compared to baseline methods that use photometric loss minimization. Results also show that our method tolerates large errors in the initial pose estimate of up to 55° in rotation and 1.1 units in translation (normalized by scene scale), achieving final pose errors of less than 5° in rotation and 0.05 units in translation on 90% of images from the Synthetic NeRF and Mip-NeRF360 datasets and on 42% of images from the more challenging Tanks and Temples dataset.

INTRODUCTION

Visual localization is the process of determining the pose (position and orientation) of a query image with respect to a previously reconstructed scene (i.e., a map).
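The third step of the pipeline described above, lifting 2D matches to 3D with the rendered depth map, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a pinhole camera with intrinsics `K`, a depth map storing distance along the optical axis, and a 4x4 `cam_to_world` pose for the rendered view. The function name `lift_to_3d` is hypothetical.

```python
import numpy as np


def lift_to_3d(pixels, depth, K, cam_to_world):
    """Back-project (u, v) pixels of the rendered image to 3D world points.

    Assumes a pinhole model and z-depth (distance along the optical axis).
    """
    pixels = np.asarray(pixels, dtype=float)
    u, v = pixels[:, 0], pixels[:, 1]

    # Nearest-pixel depth lookup; image rows are indexed by v, columns by u.
    z = depth[np.round(v).astype(int), np.round(u).astype(int)]

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) / fx * z
    y = (v - cy) / fy * z

    # Homogeneous camera-frame points, moved into the world frame.
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (cam_to_world @ pts_cam.T).T[:, :3]
```

Each lifted world point, paired with the matching pixel in the query image, forms one 2D-3D correspondence. A set of such pairs could then be passed to a general PnP solver, for example OpenCV's `cv2.solvePnPRansac`, to estimate the query pose.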

artificial intelligence, initial pose estimate, query image, (13 more...)

2504.20379

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Demetci, Pinar, Tran, Quang Huy, Redko, Ievgen, Singh, Ritambhara

Revisiting invariances and introducing priors in Gromov-Wasserstein distances

arXiv.org Artificial Intelligence, Jul-19-2023

Gromov-Wasserstein distance has found many applications in machine learning due to its ability to compare measures across metric spaces and its invariance to isometric transformations. However, in certain applications, this invariance property can be too flexible, thus undesirable. Moreover, the Gromov-Wasserstein distance solely considers pairwise sample similarities in input datasets, disregarding the raw feature representations. We propose a new optimal transport-based distance, called Augmented Gromov-Wasserstein, that allows for some control over the level of rigidity to transformations. It also incorporates feature alignments, enabling us to better leverage prior knowledge on the input data for improved performance. We present theoretical insights into the proposed metric. We then demonstrate its usefulness for single-cell multi-omic alignment tasks and a transfer learning scenario in machine learning.
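To make the isometry invariance mentioned above concrete, here is a small sketch that evaluates the standard squared-loss Gromov-Wasserstein objective for a fixed coupling. It shows only the classical objective, not the Augmented Gromov-Wasserstein distance proposed in the paper, and it does not optimize over couplings. The helper names `gw_cost` and `pairwise_dist` are illustrative.

```python
import numpy as np


def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)


def gw_cost(D_x, D_y, coupling):
    """Squared-loss GW objective for a given coupling:

    sum over i, j, k, l of (D_x[i, k] - D_y[j, l])**2 * coupling[i, j] * coupling[k, l]
    """
    # diff[i, j, k, l] = D_x[i, k] - D_y[j, l]
    diff = D_x[:, None, :, None] - D_y[None, :, None, :]
    return float(np.einsum("ijkl,ij,kl->", diff**2, coupling, coupling))


# Usage: a rotated and shifted copy of a point cloud has the same pairwise
# distances, so the identity coupling gives a (numerically) zero objective.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Y = X @ R.T + np.array([3.0, -1.0])
identity_coupling = np.eye(5) / 5
cost_isometric = gw_cost(pairwise_dist(X), pairwise_dist(Y), identity_coupling)
cost_scaled = gw_cost(pairwise_dist(X), pairwise_dist(2.0 * X), identity_coupling)
```

Because the objective sees only the pairwise distance matrices, it cannot tell `X` from its rotated copy, whereas uniform scaling changes the distances and yields a positive cost. That blindness to rigid motions is the property the abstract describes as sometimes too flexible.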

artificial intelligence, dataset, machine learning, (18 more...)

2307.10093

Country:

Europe > France (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
Africa > Togo (0.04)

Genre: Research Report (0.81)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Neural Information Processing Systems, Apr-6-2023, 17:07:04 GMT

Feature Correspondence: A Markov Chain Monte Carlo Approach

When trying to recover 3D structure from a set of images, the most difficult problem is establishing the correspondence between the measurements. Most existing approaches assume that features can be tracked across frames, whereas methods that exploit rigidity constraints to facilitate matching do so only under restricted camera motion. In this paper we propose a Bayesian approach that avoids the brittleness associated with singling out one "best" correspondence, and instead considers the distribution over all possible correspondences. We treat both a fully Bayesian approach that yields a posterior distribution, and a MAP approach that makes use of EM to maximize this posterior. We show how Markov chain Monte Carlo methods can be used to implement these techniques in practice, and present experimental results on real data.
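The idea of sampling a distribution over correspondences, rather than committing to a single best match, can be illustrated with a generic Metropolis sampler over one-to-one assignments. This is a textbook-style sketch, not the sampler from the paper. It assumes a hypothetical `cost` matrix where `cost[i][j]` is the negative log-likelihood of matching feature `i` to measurement `j`, and it uses pairwise swaps as the proposal move.

```python
import math
import random


def sample_correspondences(cost, n_steps=2000, seed=0):
    """Estimate marginal match probabilities with a Metropolis sampler.

    Target distribution over permutations p: proportional to
    exp(-sum_i cost[i][p[i]]). Returns a matrix of visit frequencies.
    """
    rng = random.Random(seed)
    n = len(cost)
    perm = list(range(n))
    rng.shuffle(perm)

    def energy(p):
        return sum(cost[i][p[i]] for i in range(n))

    counts = [[0] * n for _ in range(n)]
    current = energy(perm)
    for _ in range(n_steps):
        # Symmetric proposal: swap the assignments of two random features.
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]
        proposed = energy(perm)
        if rng.random() < math.exp(min(0.0, current - proposed)):
            current = proposed  # accept the swap
        else:
            perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        # Record the current assignment (no burn-in, for brevity).
        for a in range(n):
            counts[a][perm[a]] += 1
    return [[c / n_steps for c in row] for row in counts]
```

The returned matrix approximates, for each feature, how often each measurement was assigned to it. Ambiguous matches show up as probability mass spread over several columns instead of being forced into one hard decision.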

bayesian approach, feature correspondence, markov chain monte carlo approach

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)